Reward Collapse in Aligning Large Language Models
The extraordinary capabilities of large language models (LLMs) such as
ChatGPT and GPT-4 are in part unleashed by aligning them with reward models
that are trained on human preferences, which are often represented as rankings
of responses to prompts. In this paper, we document the phenomenon of
\textit{reward collapse}, an empirical observation where the prevailing
ranking-based approach results in an \textit{identical} reward distribution
\textit{regardless} of the prompts during the terminal phase of training. This
outcome is undesirable as open-ended prompts like ``write a short story about
your best friend'' should yield a continuous range of rewards for their
completions, while specific prompts like ``what is the capital of New Zealand''
should generate either high or low rewards. Our theoretical investigation
reveals that reward collapse is primarily due to the insufficiency of the
ranking-based objective function to incorporate prompt-related information
during optimization. This insight allows us to derive closed-form expressions
for the reward distribution associated with a set of utility functions in an
asymptotic regime. To overcome reward collapse, we introduce a prompt-aware
optimization scheme that provably admits a prompt-dependent reward distribution
within the interpolating regime. Our experimental results suggest that our
proposed prompt-aware utility functions significantly alleviate reward collapse
during the training of reward models.
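The ranking-based objective the paper analyzes can be made concrete with a minimal pairwise (Bradley-Terry-style) sketch; the function below is our own illustration, not the paper's code. Because the loss depends only on differences of rewards among one prompt's responses, it is invariant to prompt-specific shifts, which is the structural gap behind reward collapse.

```python
import math

def pairwise_ranking_loss(rewards):
    """Average pairwise logistic loss over a ranked list of reward
    scores (best first), as used to fit reward models from rankings.
    Note: the loss sees only reward *differences*, never the prompt,
    so adding a constant to every score leaves it unchanged."""
    loss, pairs = 0.0, 0
    for i in range(len(rewards)):
        for j in range(i + 1, len(rewards)):
            margin = rewards[i] - rewards[j]  # preferred minus dispreferred
            loss += math.log(1.0 + math.exp(-margin))
            pairs += 1
    return loss / pairs
```

Since minimizers of such an objective look the same for every prompt, the reward distribution collapses across prompts; the prompt-aware utilities proposed in the paper break exactly this invariance.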
REST: Retrieval-Based Speculative Decoding
We introduce Retrieval-Based Speculative Decoding (REST), a novel algorithm
designed to speed up language model generation. The key insight driving the
development of REST is the observation that the process of text generation
often includes certain common phases and patterns. Unlike previous methods that
rely on a draft language model for speculative decoding, REST harnesses the
power of retrieval to generate draft tokens. This method draws from the
reservoir of existing knowledge, retrieving and employing relevant tokens based
on the current context. Its plug-and-play nature allows for seamless
integration and acceleration of any language model, all without necessitating
additional training. When benchmarked on 7B and 13B language models in a
single-batch setting, REST achieves a significant speedup of 1.62X to 2.36X on
code or text generation. The code of REST is available at
https://github.com/FasterDecoding/REST.
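The retrieve-then-verify loop can be sketched with a toy exact-match datastore. REST itself builds a suffix index over a large corpus and verifies draft candidates in parallel; the helper names and the greedy token-by-token verifier below are simplifications of ours:

```python
def build_datastore(corpus_tokens, ngram=3):
    """Map each ngram-length context seen in the corpus to the tokens
    that followed it (a toy stand-in for REST's retrieval datastore)."""
    store = {}
    for seq in corpus_tokens:
        for i in range(len(seq) - ngram):
            key = tuple(seq[i:i + ngram])
            # draft up to 4 continuation tokens for this context
            store.setdefault(key, seq[i + ngram:i + ngram + 4])
    return store

def retrieve_drafts(store, context, ngram=3):
    """Look up draft tokens for the current context suffix."""
    return store.get(tuple(context[-ngram:]), [])

def speculative_step(target_model, context, drafts):
    """Accept drafted tokens while the target model agrees; the model
    contributes one extra token at the first disagreement, so each
    step emits at least one token and often several."""
    accepted = []
    for tok in drafts:
        if target_model(context + accepted) == tok:
            accepted.append(tok)
        else:
            break
    accepted.append(target_model(context + accepted))
    return accepted
```

No draft model is trained or run: all speculation comes from retrieval, which is what makes the approach plug-and-play for any target model.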
Is Vertical Logistic Regression Privacy-Preserving? A Comprehensive Privacy Analysis and Beyond
We consider vertical logistic regression (VLR) trained with mini-batch
gradient descent -- a setting which has attracted growing interest among
industries and proven to be useful in a wide range of applications including
finance and medical research. We provide a comprehensive and rigorous privacy
analysis of VLR in a class of open-source Federated Learning frameworks, where
the protocols may differ from one another, yet a procedure for obtaining
local gradients is implicitly shared. We first consider the honest-but-curious
threat model, in which the detailed implementation of the protocol is set aside
and only the shared procedure is assumed, which we abstract as an oracle. We find
that even in this general setting, single-dimensional features and labels can
still be recovered from the other party under suitable constraints on the batch
size, thus demonstrating the potential vulnerability of all frameworks
following the same philosophy. Then we look into a popular instantiation of the
protocol based on Homomorphic Encryption (HE). We propose an active attack that
significantly weakens the constraints on batch size in the previous analysis by
generating and compressing auxiliary ciphertexts. To address the privacy leakage
within the HE-based protocol, we develop a simple-yet-effective countermeasure
based on Differential Privacy (DP), and provide both utility and privacy
guarantees for the updated algorithm. Finally, we empirically verify the
effectiveness of our attack and defense on benchmark datasets. Altogether, our
findings suggest that all vertical federated learning frameworks that solely
depend on HE might contain severe privacy risks, and DP, which has already
demonstrated its power in horizontal federated learning, can also play a
crucial role in the vertical setting, especially when coupled with HE or secure
multi-party computation (MPC) techniques.
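The shared local-gradient procedure that the analysis abstracts as an oracle can be sketched as follows. This is a plaintext simplification in our own notation; real frameworks exchange the residual under HE or MPC:

```python
import numpy as np

def vlr_local_gradients(XA, wA, XB, wB, y):
    """Sketch of the implicitly shared VLR procedure.  Party A holds
    feature block XA and weights wA; party B holds XB, wB, and the
    labels y.  Rows of XA and XB are aligned on the same samples."""
    z = XA @ wA + XB @ wB              # joint logit from both feature blocks
    d = 1.0 / (1.0 + np.exp(-z)) - y   # per-sample residual
    # Each party's gradient touches only its own features.  With batch
    # size 1, gA is a scalar multiple of that sample's feature vector,
    # which illustrates why the analysis constrains the batch size.
    gA = XA.T @ d / len(y)
    gB = XB.T @ d / len(y)
    return gA, gB
```

Any framework exposing these local gradients fits the oracle abstraction, regardless of how the intermediate values are protected in transit.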
SSLRec: A Self-Supervised Learning Framework for Recommendation
Self-supervised learning (SSL) has gained significant interest in recent
years as a solution to address the challenges posed by sparse and noisy data in
recommender systems. Despite the growing number of SSL algorithms designed to
provide state-of-the-art performance in various recommendation scenarios (e.g.,
graph collaborative filtering, sequential recommendation, social
recommendation, KG-enhanced recommendation), there is still a lack of unified
frameworks that integrate recommendation algorithms across different domains.
Such a framework could serve as the cornerstone for self-supervised
recommendation algorithms, unifying the validation of existing methods and
driving the design of new ones. To address this gap, we introduce SSLRec, a
novel benchmark platform that provides a standardized, flexible, and
comprehensive framework for evaluating various SSL-enhanced recommenders. The
SSLRec framework features a modular architecture that allows users to easily
evaluate state-of-the-art models, as well as a complete set of data augmentation
and self-supervised toolkits that help create SSL recommendation models for
specific needs. Furthermore, SSLRec simplifies the process of training and evaluating
different recommendation models with consistent and fair settings. Our SSLRec
platform covers a comprehensive set of state-of-the-art SSL-enhanced
recommendation models across different scenarios, enabling researchers to
evaluate these cutting-edge models and drive further innovation in the field.
Our implemented SSLRec framework is available at the source code repository
https://github.com/HKUDS/SSLRec.
Comment: Published as a WSDM'24 full paper (oral presentation).
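The kind of augmentation-and-objective toolkit such a framework bundles can be illustrated with a generic view-generation plus contrastive-loss sketch. This is illustrative NumPy in our own naming, not the SSLRec API:

```python
import numpy as np

def edge_dropout(edges, rho, rng):
    """Randomly drop a fraction rho of interaction edges -- a common
    view-generation augmentation in SSL collaborative filtering."""
    mask = rng.random(len(edges)) >= rho
    return [e for e, keep in zip(edges, mask) if keep]

def info_nce(z1, z2, temperature=0.2):
    """InfoNCE contrastive loss between two augmented views of the
    same node embeddings (rows aligned across views): each row's
    positive is the matching row in the other view."""
    z1 = z1 / np.linalg.norm(z1, axis=1, keepdims=True)
    z2 = z2 / np.linalg.norm(z2, axis=1, keepdims=True)
    logits = z1 @ z2.T / temperature
    # positives sit on the diagonal; softmax cross-entropy against them
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_probs))
```

Standardizing pieces like these behind a common interface is what lets a benchmark platform compare SSL recommenders under consistent, fair settings.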